
<html>
<head>
<title>Improving Balanced Scheduling with Compiler Optimizations that Increase Instruction-Level Parallelism</title>
</head>

<body>
<H1>Improving Balanced Scheduling with Compiler Optimizations that Increase Instruction-Level Parallelism</H1>
<hr>

<a href="http://www.cs.washington.edu/homes/jlo">
Jack L. Lo</a> and 
<a href="http://www.cs.washington.edu/homes/eggers">
Susan J. Eggers</a>
<hr>

Traditional list schedulers order instructions based on an optimistic
estimate of the load latency imposed by the hardware and therefore
cannot respond to variations in memory latency caused by cache hits
and misses on non-blocking architectures.  In contrast, balanced
scheduling schedules instructions based on an estimate of the amount of
instruction-level parallelism in the program.  By scheduling independent
instructions behind loads based on what the program can provide, rather
than what the implementation stipulates in the best case (i.e., a cache
hit), balanced scheduling can hide variations in memory latencies more 
effectively.
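
The contrast between the two latency estimates can be sketched as follows. This is an illustrative toy model, not the paper's implementation: the function names, the best-case latency of 2 cycles, and the "one cycle per independent instruction" weighting are all assumptions made for the example.

```python
# Hypothetical sketch of the two load-latency estimates described above.
# All names and constants are illustrative assumptions, not taken from
# the Multiflow-based implementation in the paper.

BEST_CASE_LOAD_LATENCY = 2  # assumed cache-hit latency, in cycles


def list_scheduler_estimate(independent_insts):
    # A traditional list scheduler assumes the hardware's best case
    # (a cache hit) for every load, no matter how much independent
    # work the program could actually supply behind it.
    return BEST_CASE_LOAD_LATENCY


def balanced_scheduler_estimate(independent_insts):
    # Balanced scheduling instead sizes each load's estimate by the
    # parallelism the program provides: the more independent
    # instructions available to schedule behind the load, the longer
    # the latency it can afford to assume (and thus hide).
    return 1 + len(independent_insts)


# Example: a load with three independent instructions available.
indep = ["add r1, r2", "mul r3, r4", "sub r5, r6"]
print(list_scheduler_estimate(indep))      # 2: fixed best-case guess
print(balanced_scheduler_estimate(indep))  # 4: grows with available ILP
```

Under this toy model, the balanced estimate rises with the instruction-level parallelism in the code, which is why the optimizations studied below (which increase that parallelism) should help it disproportionately.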

Since its success depends on the amount of instruction-level parallelism
in the code, balanced scheduling should perform even better when more
parallelism is available.  In this study, we combine balanced scheduling
with three compiler optimizations that increase instruction-level parallelism:
loop unrolling, trace scheduling and cache locality analysis.  Using code
generated for the DEC Alpha by the Multiflow compiler, we simulated a
non-blocking processor architecture that closely models the Alpha 21164.  Our
results show that balanced scheduling benefits from all three optimizations,
producing average speedups that range from 1.15 to 1.40, across the 
optimizations.  More importantly, because of its ability to tolerate
variations in load interlocks, it improves its advantage over traditional
scheduling.  Without the optimizations, balanced scheduled code is, on
average, 1.05 times faster than that generated by a traditional scheduler;
with them, its lead increases to 1.18.

<hr>

<i>
In Proceedings of the ACM SIGPLAN '95 Conference on Programming Language Design and Implementation, La Jolla, California, June 1995, pages 151-162.
</i><br>
<p>
To get the PostScript file, click
<a href="http://www.cs.washington.edu/homes/jlo/papers/pldi95.ps">here</a>.

<hr>
<address>
<em>jlo@cs.washington.edu </em> <br>
</address>
</body>
</html>
